Abstract. Coron is a domain- and platform-independent, multi-purpose
data mining toolkit that incorporates not only a rich collection of data
mining algorithms but also a number of auxiliary operations. To
the best of our knowledge, no other data mining toolkit is designed
specifically for itemset extraction and association rule generation.
Coron also provides support for preparing and filtering
data, and for interpreting the extracted units of knowledge.
1 System Overview

Born from a particular need in a cohort study [1], Coron is now a standalone
framework for knowledge discovery in databases, used in several application
domains, e.g. [4-6]. Intended for educational and scientific use, the Coron
system is organized into several modules for preparing and mining binary data,
and for filtering and interpreting the extracted units. Thus, from binary data
(possibly obtained from a discretization procedure), Coron allows one to extract
itemsets (frequent, closed, generators, etc.) and then to generate association
rules (non-redundant, informative, etc.). Building concept lattices is also
possible. The system includes many classical algorithms from the literature, as
well as others that are specific to Coron [9-11]. The software is freely available at
http://coron.loria.fr. Mainly written in Java, Coron is compatible with
the Unix, Mac and Windows operating systems and is used from the command line.

2 A Global Data Mining Methodology 

The methodology was initially designed for mining biological cohorts, but it
generalizes to any kind of database. It is important to note that the whole
process is guided by an expert, who is a specialist of the domain related to
the database. The expert's role may be crucial, especially for selecting the data and for
interpreting the extracted units, in order to fully turn them into knowledge units.
In our case, the extracted knowledge units are mainly association rules. At
present, finding association rules is one of the most important tasks in data
mining. Association rules allow one to reveal "hidden" relationships in a dataset.
Finding association rules first requires the extraction of frequent itemsets.
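
To make the link between frequent itemsets and association rules concrete, the following minimal Python sketch computes the support of an itemset and the confidence of a rule on a toy dataset, using the standard definitions. The function names and the data are hypothetical illustrations and do not reflect Coron's own code or command-line interface.

```python
# Toy binary dataset (hypothetical): each transaction is the set of
# properties that one individual possesses.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions)


def support(itemset, transactions):
    """Relative support: fraction of transactions containing `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)


print(support({"a", "b"}, transactions))       # 0.6
print(confidence({"a"}, {"b"}, transactions))  # 0.75
```

The confidence of a rule depends only on the supports of its antecedent and of the union of antecedent and consequent, which is why frequent itemsets are extracted first.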

The methodology consists of the following steps: (i) definition of the study
framework; (ii) an iterative step comprising data preparation and cleaning,
pre-processing, processing, and post-processing; (iii) validation of the results
and generation of new research hypotheses; (iv) feedback on the experiment. The life-cycle of the
methodology is shown in Figure 1. Coron is designed to support this
methodology and offers all the tools necessary for its application in
a single platform.



Pre-processing. These modules provide several tools for manipulating and
formatting large data. The data are described by binary tables in a simple text-file
format: each line corresponds to an individual and each column to a property
that the individual either possesses or not. The
main possible operations are: (i) discretization of numerical data, (ii) conversion
between different file formats, (iii) creation of the complement of the binary table, and
(iv) other projection operations such as transposition of the table.
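
As an illustration of operations (i), (iii) and (iv), here is a minimal Python sketch on an in-memory binary table. It assumes a simple threshold-based discretization, which is only one possible choice; the function names are hypothetical and the sketch does not reproduce Coron's file formats or its actual pre-processing modules.

```python
def discretize(values, threshold):
    """Binarize a numerical column: 1 if the value reaches the threshold.

    Threshold-based binarization is an assumption made for this sketch.
    """
    return [1 if v >= threshold else 0 for v in values]


def complement(table):
    """Swap 0 and 1 in every cell of a binary table."""
    return [[1 - cell for cell in row] for row in table]


def transpose(table):
    """Exchange the roles of individuals (lines) and properties (columns)."""
    return [list(column) for column in zip(*table)]


ages = [23, 67, 45]                 # hypothetical numerical attribute
table = [[1, 0, 1],
         [0, 1, 1]]                 # 2 individuals, 3 properties

print(discretize(ages, 50))         # [0, 1, 0]
print(complement(table))            # [[0, 1, 0], [1, 0, 0]]
print(transpose(table))             # [[1, 0], [0, 1], [1, 1]]
```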




Fig. 1. Architecture of the Coron System 
Data mining. Extracting itemsets and association rules is a very popular task
in data mining. Concept lattices are mathematical structures supported by a
rich and well-established formalism, namely Formal Concept Analysis [13]. A
concept lattice is represented by a diagram that gives a clear visualization of the classes of
objects of a domain. Thus, the data mining modules of the Coron System offer
the following possibilities:

— Itemset extraction: frequent, closed, rare, generators, etc. This task is
performed by a large collection of algorithms based on different search strategies
(depth-first, level-wise, etc.).

— Association rule generation: frequent, rare, closed, informative, minimal
non-redundant, Duquenne-Guigues basis, etc. Each rule is reported with
a set of measures such as support, confidence, lift, conviction, etc.

— Concept lattice construction. 
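
The first two possibilities can be illustrated with a short Python sketch: a generic level-wise (Apriori-style) search for frequent itemsets, a filter that keeps the closed ones, and the standard definitions of the four rule measures named above. This is a didactic sketch under stated assumptions, not one of the algorithms implemented in Coron; all function names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Level-wise search; returns {itemset: absolute support count}."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    frequent = {}
    while level:
        # Count the candidates of the current size, keep the frequent ones.
        counts = {c: sum(c <= t for t in transactions) for c in level}
        kept = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(kept)
        # Join frequent k-itemsets into (k+1)-candidates and prune every
        # candidate that has an infrequent k-subset.
        size = len(level[0]) + 1
        candidates = set()
        for a, b in combinations(kept, 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in kept for sub in combinations(union, size - 1)
            ):
                candidates.add(union)
        level = list(candidates)
    return frequent


def closed_itemsets(frequent):
    """Keep itemsets that have no proper superset with the same support."""
    return {s: n for s, n in frequent.items()
            if not any(s < t and n == m for t, m in frequent.items())}


def rule_measures(lhs, rhs, frequent, n):
    """Support, confidence, lift and conviction of the rule lhs -> rhs."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    n_lhs, n_rhs, n_both = frequent[lhs], frequent[rhs], frequent[lhs | rhs]
    conf = n_both / n_lhs
    return {
        "support": n_both / n,
        "confidence": conf,
        "lift": (n_both * n) / (n_lhs * n_rhs),
        # Conviction is infinite for exact rules (confidence 1).
        "conviction": math.inf if n_both == n_lhs
        else (1 - n_rhs / n) / (1 - conf),
    }


# Toy dataset (hypothetical).
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"},
                {"c", "d"}, {"a", "b", "c"}]
frequent = frequent_itemsets(transactions, min_count=2)
closed = closed_itemsets(frequent)
measures = rule_measures("a", "b", frequent, len(transactions))
```

On this toy dataset, eight itemsets are frequent and four of them ({c}, {d}, {a, b} and {a, b, c}) are closed. The rule a -> b holds in every transaction that contains a, so its confidence is 1 and its conviction is infinite.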

Post-processing. The units extracted in the data mining step may be very
numerous and may hide some units of higher interest. Thus, Coron provides filtering
operations that should be carried out in interaction with a domain expert. The
analyst may filter rules w.r.t. the length of their components and/or the presence of a
given property. The analyst may also retain the k best extracted units w.r.t. a measure of
interest. It is also possible to color some properties in a list of association rules.
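
The filtering operations described above can be sketched in a few lines of Python. The `Rule` record, the function names and the toy rule values below are hypothetical and serve only to illustrate the idea; they do not correspond to Coron's actual post-processing modules or output format.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """Hypothetical record for one association rule."""
    antecedent: frozenset
    consequent: frozenset
    support: float
    confidence: float


def filter_rules(rules, max_antecedent=None, max_consequent=None,
                 must_contain=None):
    """Keep rules by component length and/or presence of a property."""
    kept = []
    for r in rules:
        if max_antecedent is not None and len(r.antecedent) > max_antecedent:
            continue
        if max_consequent is not None and len(r.consequent) > max_consequent:
            continue
        if must_contain is not None and \
                must_contain not in (r.antecedent | r.consequent):
            continue
        kept.append(r)
    return kept


def top_k(rules, k, measure="confidence"):
    """Return the k best rules w.r.t. the given measure (highest first)."""
    return sorted(rules, key=lambda r: getattr(r, measure), reverse=True)[:k]


# Toy rules (illustrative values only).
rules = [
    Rule(frozenset("a"), frozenset("b"), 0.8, 1.0),
    Rule(frozenset("c"), frozenset("ab"), 0.4, 2 / 3),
    Rule(frozenset("ac"), frozenset("b"), 0.4, 1.0),
]

short_rules = filter_rules(rules, max_antecedent=1)   # first two rules
with_c = filter_rules(rules, must_contain="c")        # last two rules
best = top_k(rules, 2, measure="support")             # first two rules
```

Because Python's sort is stable, rules with equal values of the chosen measure keep their original relative order in `top_k`.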

Toolbox. Finally, auxiliary modules allow one to visualize equivalence classes of 
itemsets, randomly generate binary data, etc. 
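
As a rough illustration of these two auxiliary operations, the Python sketch below generates a random binary table and groups itemsets into equivalence classes. It assumes the usual definition in which two itemsets are equivalent when they are contained in exactly the same set of objects; the function names, parameters and toy data are hypothetical and unrelated to Coron's actual toolbox.

```python
import random
from collections import defaultdict
from itertools import combinations


def random_binary_table(n_rows, n_cols, density=0.5, seed=None):
    """Random 0/1 table; each cell is 1 with probability `density`."""
    rng = random.Random(seed)
    return [[1 if rng.random() < density else 0 for _ in range(n_cols)]
            for _ in range(n_rows)]


def equivalence_classes(transactions, max_size=3):
    """Group itemsets (up to max_size items) by their supporting objects.

    Itemsets that occur in no transaction are ignored.
    """
    items = sorted({i for t in transactions for i in t})
    classes = defaultdict(list)
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            tids = frozenset(i for i, t in enumerate(transactions)
                             if itemset <= t)
            if tids:
                classes[tids].append(itemset)
    return dict(classes)


table = random_binary_table(4, 3, density=0.5, seed=0)

# Toy dataset (hypothetical).
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"},
                {"c", "d"}, {"a", "b", "c"}]
classes = equivalence_classes(transactions)
```

On the toy dataset, {a}, {b} and {a, b} fall into the same class because each of them is contained in exactly the transactions 0, 1, 2 and 4.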

3 Applications 

Coron has been used for the following tasks: extraction of adaptation knowledge
in case-based reasoning [4], gene expression data analysis [5, 12], information
retrieval [7], recommendations for internet advertisement [6], biological data
integration [8], and finally, cohort studies [1].

4 Work in Progress 

Currently, we are studying how to integrate Coron into platforms based on
graphical data flows, such as Knime [2], whose popularity is increasing
(http://www.knime.org). This would allow Coron to interact with many other useful tools
and, most importantly, would avoid command-line usage. Also, other tools will be
integrated into Coron to handle complex data, mainly numerical, see e.g. [12].
Finally, we have recently set up a forum to gather questions, comments and
suggestions from Coron users (http://coron.loria.fr/forum/).

In this paper, we have given a brief overview of the Coron System. For more 
details, please refer to the project's website at http://coron.loria.fr. 